Method for model-free control of general discrete-time systems

ABSTRACT

A method of developing a controller for general (nonlinear) discrete-time systems, where the equations governing the system are unknown and where a controller is estimated without building or assuming a model for the system. The controller is constructed through the use of a function approximator (FA) such as a neural network or polynomial. This involves the estimation of the unknown parameters within the FA through the use of a stochastic approximation that is based on a simultaneous perturbation gradient approximation, which requires only system measurements (not a system model).

STATEMENT OF GOVERNMENTAL INTEREST

The Government has rights in this invention pursuant to Contract No. N00039-91-0001 awarded by the Department of the Navy.

This is a continuation of copending application Ser. No. 08/073,371 filed on Jun. 4, 1993 now abandoned.

BACKGROUND OF THE INVENTION

The invention relates to the control of complex, general (nonlinear), discrete-time, e.g., physical, socioeconomic or biological, systems in which not only the parameters but the equations governing the system are unknown. More specifically, the invention is a method for estimating a system controller without building or assuming a model for the system. The controller is constructed through the use of a function approximator such as a neural network or polynomial which, in turn, uses a stochastic approximation algorithm. The algorithm is based on a simultaneous perturbation gradient approximation which requires only system measurements, not a system model.

Frequently, a system designer must determine a means to control and regulate a system when there is uncertainty about the nature of the underlying process. Adaptive control procedures have been developed in a variety of areas for such problems (e.g., manufacturing process control, robot arm manipulation, aircraft control, etc.), but these are typically limited by the need to assume that the forms of the system equations are known (and usually linear) while the parameters may be unknown. In complex physical, socioeconomic, or biological systems, however, the forms of the system equations (typically nonlinear), as well as the parameters, are often unknown, making it impossible to determine the control law needed in existing adaptive control procedures.

Numerous current methods use a function approximation (FA) technique to construct the controller for such systems. Associated with any FA will be a set of parameters that must be determined. Popular FAs include, for example, polynomials, multilayered feed-forward neural networks, recurrent neural networks, splines, trigonometric series, radial basis functions, etc.

By results such as the well-known Stone-Weierstrass approximation theorem, it can be shown that many of these FA techniques share the property that they can be used to approximate any continuous function to any degree of accuracy (of course, this is merely an existence result, so experimentation must usually be performed in practice to ensure that the desired level of accuracy with a given FA is being achieved). Each FA technique tends to have advantages and disadvantages, e.g., polynomials have a relatively easy physical interpretability, but the number of parameters to be determined grows rapidly with input dimension or polynomial order.

Others have considered the problem of developing controllers based on FAs when there is minimal information about the system equations. The vast majority of such techniques are indirect control methods in the sense that a second FA is introduced to model the open-loop behavior of the system. This open-loop FA is typically determined from sample input-output "training" data on the system prior to operating the system in closed loop and constructing the controller FA.

It is not possible in this model-free framework to obtain the derivatives necessary to implement standard gradient-based search techniques (such as back-propagation) for estimating the unknown parameters of the FA. Stochastic approximation (SA) algorithms based on approximations to the required gradient will, therefore, be considered. Usually such algorithms rely on well-known finite-difference approximations to the gradient. The finite-difference approach, however, can be very costly in terms of the number of system measurements required, especially in high-dimensional problems such as estimating an FA parameter vector (which may easily have dimension of order 10² or 10³). Further, real-time implementation of finite-difference methods would often suffer since the underlying system dynamics may change during the relatively long period in which measurements are being collected for one gradient approximation.

There is therefore a need for a control procedure that does not require a model for the underlying system.

SUMMARY OF THE INVENTION

The method of the invention uses the system measurements to determine a controller without the need to estimate or assume a separate model for the system. The controller acts as a function, taking as input past and current system information and producing as output a control value to affect future system performance.

Since the method of the invention is generic, it will be presented without regard to which type of FA is to be used, although the invention will be demonstrated using polynomials and neural networks.

The direct control method of the invention does not require any open-loop estimation, instead constructing the one (controller) FA while the system is operating in closed-loop. Thus we avoid the need for open loop "training" data, which may be difficult or expensive to obtain, and the need to estimate perhaps twice as many parameters (for constructing two FAs). Further, the invention offers potential advantages in being better able to handle changes in the underlying system dynamics (since it is not tied to a prior system model) and being more robust in the face of widely varying control inputs (i.e., the indirect approach may perform poorly for closed-loop controls outside of the range of open-loop controls used in the prior identification step).

The invention uses a SA algorithm based on a "simultaneous perturbation" gradient approximation, which is typically much more efficient than the finite-difference SA algorithms in the amount of data required. In particular, the simultaneous perturbation approximation requires only two system measurements versus the hundreds (or more) typically required in finite-difference methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1a and 1b illustrate in block diagram format the direct approximation and self-tuning implementations, respectively, of the method of the invention.

FIG. 2, illustrates the experimental results of the application of the method of the invention to an additive control system.

FIG. 3, illustrates the experimental results of the application of the method of the invention to an affine-nonlinear model.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Consider a general discrete-time state-space system of the form ##EQU1## where φ_(k) (·) and h_(k) (·) are generally unknown nonlinear functions governing the system dynamics and measurement process, u_(k) is the control input applied to govern the system at time k+1, and w_(k), v_(k) are noise terms (not necessarily serially independent or independent of each other). Based on information contained within measurements and controls up to y_(k) and u_(k-1), the goal is to choose a control u_(k) in a manner such that some loss function related to the next measurement y_(k+1) is minimized. Oftentimes, this loss function will be one that compares y_(k+1) against a target value t_(k+1), penalizing deviations between the two. Without sacrificing generality, the focus below will be on this target-tracking control problem, although the method of the invention would apply in other (e.g., optimal control) problems as well, as discussed briefly below.

In the method of the invention, a function approximator (FA) (e.g., neural network, polynomial) will be used to produce the control u_(k). An FA of fixed structure across time (e.g., the number of layers and nodes in a neural network is fixed) will be considered but underlying parameters in the FA (e.g., connection weights in a neural network) will be allowed to be updated. Since past information will serve as input to the control, this fixed structure requires that u_(k) be based on only a fixed number of previous measurements and controls (in contrast to using all information available up to time k for each k). Suppose the "sliding window" of previous information available at time k, say I_(k), contains M previous measurements and N previous controls. Thus when the system is operating without the FA parameters being updated, we have

    I.sub.k ={y.sub.k, y.sub.k-1, . . . , y.sub.k-M+1 ; u.sub.k-1, u.sub.k-2, . . . , u.sub.k-N }

(when the FA parameters are being updated the set I_(k) of M and N previous measurements and controls will contain "test" values, as discussed below.)

Two methods to the problem of controlling the system of equations (1) and (2) in the face of uncertainty about the form of φ_(k) (·) and h_(k) (·) will be considered, as illustrated in FIGS. 1a and 1b, for the target-tracking problem in the important special case where I_(k) ={y_(k) } (for the more general definition of I_(k), as given above, the diagrams would be modified in an obvious way). In the direct approximation method of FIG. 1a, the output of the FA will correspond directly to the elements of the u_(k) vector, i.e. the inputs to the FA will be I_(k) (=y_(k) here) and t_(k+1) and the output will be u_(k). This approach is appropriate, e.g., when both φ_(k) (·) and h_(k) (·) are unknown functions.

In contrast, the self-tuning method of FIG. 1b requires some prior information about the form of φ_(k) (·) and h_(k) (·). In particular, it requires that enough information be available to write u_(k) =π_(k) (f_(k) (I_(k)), t_(k+1)), where π_(k) (·) is a known control law that depends on some unknown function f_(k) (·). As demonstrated below, a very important type of process to which this second method can apply is an affine-nonlinear (i.e., generalized bilinear) system. When prior information associated with knowledge of φ_(k) (·) and h_(k) (·) is available, the self-tuning method of FIG. 1b is often able to yield a controller superior to the direct approximation method of FIG. 1a.

Associated with the FA generating u_(k) will be a parameter vector θ_(k) that must be estimated (e.g., the connection weights in a neural network). It is assumed that the FA structure is fixed across time. Hence the adaptive control problem of finding the optimum control at time k is equivalent to finding the θ_(k) ε^(p), p not a function of k, that minimizes some loss function L_(k) (θ_(k)). A common loss is the one-step-ahead quadratic tracking error,

    L.sub.k (θ.sub.k)=E (y.sub.k+1 -t.sub.k+1).sup.T A.sub.k (y.sub.k+1 -t.sub.k+1)+u.sub.k.sup.T B.sub.k u.sub.k !,              (3)

where A_(k), B_(k) are positive semi-definite matrices reflecting the relative weight to put on deviations from the target and on the cost associated with larger values of u_(k). The invention would also apply with non-quadratic and/or non-target-tracking loss functions. Such functions might arise, e.g., in constrained optimal control problems where we are trying to minimize some cost (without regard to a specific target value) and penalty functions are used for certain values of y_(k+1) and/or u_(k) to reflect problem constraints. For convenience, however, this discussion will illustrate points with the target tracking problem exemplified by equation (3). Note that although equation (3) is a one-time-step error function, much of the adaptive control literature focuses on minimizing a loss function over an infinite horizon. Note also that if the unconditional loss function L_(k) (θ_(k)) were replaced by a conditional loss as is sometimes seen in the control literature (e.g., equation (3)) replaced by an expected tracking error conditional on previous measurements and controls), the same optimal θ_(k) would result; this follows since under modest conditions, E(∂L_(k) ^(cond) /∂θ_(k))=∂E(L_(k) ^(cond))/.differential.θ_(k) =∂L_(k) /∂θ_(k) =0 at the optimal θ_(k), where L_(k) ^(cond) represents the conditional loss.

With a control of the form in FIG. 1a or 1b, the problem of minimizing L_(k) (θ_(k)) implies that for each k we are seeking the globally optimal θ*_(k) such that ##EQU2## Since φ_(k) (·) and h_(k+1) (·) are incompletely known functions, the term ∂L_(k) /∂u_(k), which involves the terms ∂h_(k+1) /∂x_(k+1) and ∂φ_(k) /∂u_(k), is not generally computable. Hence, g_(k) (θ_(k)) is not generally available in either of the methods in FIGS. 1a and 1b.¹ To illustrate this fact, consider a simple scalar-deterministic version of the system of equations (1) and (2). Then, under the standard squared-error loss, ##EQU3## We see that the right-hand side of the above equation contains unknown expressions. The same principles apply in the more general multivariate stochastic version of the model of equations (1) and (2). Thus the standard gradient descent algorithm (e.g., back-propagation) or any other algorithm involving g_(k) (θ_(k)), is not feasible.

Because gradient-descent-type algorithms are not generally feasible in the model-free setting here, the method of the invention uses a stochastic approximation (SA) algorithm of the form

    θ.sub.k =θ.sub.k-1 -a.sub.k (gradient approx.).sub.k (5)

to estimate {θ_(k) }, where θ_(k) denotes the estimate at the given iteration, {a_(k) } is a scalar gain sequence satisfying certain regularity conditions, and the gradient approximation is such that it does not require knowledge of φ_(k) (·) and h_(k) (·) in equations (1) and (2).

Recall that what is being sought is the FA parameter vector at each time point that minimizes L_(k) (θ_(k)), i.e., a θ*_(k) such that g_(k) (θ*_(k))=0. Recall also that since gradient-based search algorithms are not applicable, an SA-based approach will be considered.

A detailed analysis of the SPSA approach to optimization in the classical setting of a time-invariant loss function L(·) and corresponding fixed minimum is discussed in Spall, J. C. 1992!, "Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation," IEEE Trans. Auto. Control, vol. 37, pp. 332-341 (hereafter Spall 1992!). It is shown that the SPSA algorithm has the usual almost sure (a.s.) convergence and asymptotic normality properties of finite-difference SA (FDSA) algorithms of the Kiefer-Wolfowitz form, but that the asymptotic normality result indicates that SPSA can achieve the same level of asymptotic accuracy as FDSA with only 1/p the number of system measurements. This is of particular interest in FAs for nonlinear, multivariate problems since p can easily be on the order of 10² or 10³. Of course, in the control setting here the loss function L_(k) (·) is generally time-varying and hence it can not be automatically assumed that the results of Spall 1992! apply. Therefore, the conditions under which the SPSA estimation error θ_(k) -θ*_(k) converges a.s. to zero as in the time-variant loss setting will be presented.

Unfortunately, it does not appear possible to similarly produce an asymptotic normality result for the time-varying L_(k) (·) setting, which would provide a formal proof that asymptotically SPSA achieves the same level of accuracy as Finite-Difference Stochastic Approximation (FDSA) with only 1/p^(th) the number of measurements for the gradient approximations. This follows from the fact that the limiting mean and variance of the asymptotic normal distribution in Spall 1992! are based on the (fixed) values of the second and third derivatives of the loss function and, of course, such fixed values do not exist in the time-varying setting here. Nevertheless, the proven advantage of SPSA in the fixed loss function case together with the a.s. convergence of SPSA established below for the time-varying L_(k) (·) setting provide ample theoretical evidence of the advantage of SPSA over the standard FDSA approach in our control problem.

In line with equations (4) and (5), the SPSA algorithm has the form

    θ.sub.k =θ.sub.k-1 -a.sub.k g.sub.k (θ.sub.k-1) (6)

where g_(k) (θ_(k-1)) is the simultaneous perturbation approximation to g_(k) (θ_(k-1)). In particular the l^(th) component of g_(k) (θ_(k-1)), l=1, 2, . . . , p, is given by ##EQU4## where

L_(k).sup.(±) are estimated values of L_(k) (θ_(k-1) ±c_(k) Δ_(k)) using the observed y_(k+1).sup.(±) and u_(k).sup.(±), e.g., for L_(k) (θ_(k)) as in (2.2), L_(k).sup.(±) =(y_(k+1).sup.(±) -t_(k+1)) ^(T) A_(k) (y_(k+1).sup.(±) -t_(k+1))+u_(k).sup.(±) T B_(k) u_(k).sup.(±)

u_(k).sup.(±) are controls based on a FA with parameter vector θ_(k) =θ_(k-1) ±c_(k) Δ_(k), where Δ_(k) =(Δ_(k1), Δ_(k2), . . . , Δ_(kp))^(T), with the {Δ_(ki) } independent, bounded, symmetrically distributed (about 0) random variables ∀k, i, identically distributed at each k, with E(Δ_(ki) ⁻²) uniformly bounded ∀k, i (see below for comments on this latter critical! regularity condition),

y_(k+1).sup.(±) are measurements based on u_(k).sup.(±),

{c_(k) } is a sequence of positive numbers satisfying certain regularity conditions (typically c_(k) →0 or c_(k) =c ∀k, depending on whether the system equations are stationary or nonstationary, as discussed below).

The key fact to observe is that at any iteration only two measurements are needed to approximate g_(k) (·) (i.e., the numerators are the same for all p components). This is in contrast to the standard FDSA approach where 2p measurements are needed (i.e., for the l^(th) component of the gradient approximation, the quantity Δ_(k) is replaced by a vector with a positive constant in the l^(th) place and zeroes elsewhere). A variation on the form in equation (7) is to average several gradient approximations, with each vector in the average being based on a new (independent) value of Δ_(k) and a corresponding new pair of measurements; this will often enhance the performance of the algorithm. (Note that at each k the process and measurement functions φ_(k) (·), h_(k+1) (·) are assumed to remain stationary during the period in which g_(k) (·) is being generated.) A further variation on equation (7) is to smooth across time by a weighted average of the previous and current gradient estimates (analogous to the "momentum" approach in back-propagation); such smoothing can often improve the performance of the algorithm.

There are several ways in which a practical control strategy could be implemented using the SPSA algorithm in equations (6) and (7). These differences result from whether or not it is desired to produce a "nominal" x_(k+1) and y_(k+1) based on a control with updated θ_(k) =θ_(k) (only x_(k+1).sup.(±), y_(k+1).sup.(±) are required in implementing equations (6) and (7) which use θ_(k) =θ_(k-1) ±c_(k) Δ_(k)) and whether or not it is possible to reset the system from a given state value to the previous state value. As an illustration of a strategy for neural network-based control when direct measurement of the states exists, suppose that we wish to produce a sequence of system measurements that includes nominal measurements, i.e., the sequence is {y₀, y₁.sup.(+), y₁.sup.(-), y₁, y₂.sup.(+), y₂.sup.(-), y₂, . . . }, and that the system cannot be readily reset from one state to the previous state (as is quite common). Then, as discussed above, each of u_(k).sup.(+), u_(k).sup.(-), u_(k) are produced using the previous M measurements and N controls, which comprise the information sets I_(k).sup.(+), I_(k).sup.(-), and I_(k) (say) respectively (note that the elements of I_(k) here will differ from those above where no parameter update is taking place). Thus, for example, if M=N=2, then u_(k).sup.(-) is based on θ_(k) =θ_(k-1) -c_(k) Δ_(k), t_(k+1), and I_(k).sup.(-) ={y_(k+1).sup.(+), y_(k), u_(k).sup.(+), u_(k-1) }.

Let us now show that, as in the fixed loss function case of Spall 1992!, the difference θ_(k) -θ* is almost surely (a.s.) convergent to 0 where θ* is such that θ*_(k) →θ* as k→∞. (Some discussion regarding the non-converging θ*_(k) case is given below.) We will use the following regularity conditions, which are similar to those of Spall 1992!; some discussion is provided below on the conditions relative to the control problem here. Note that having θ*_(k) →θ* does not imply the (say) pointwise convergence of L_(k) (·) to some fixed L(·); in fact L_(k) (θ*_(k)) may be perpetually varying even when θ*_(k) =θ*∀k, as discussed below. We let ∥·∥ denote any vector norm.

C.0 E(ε_(k).sup.(+) -ε_(k).sup.(-) |θ₁, θ₂, . . . , θ_(k-1) ; Δ_(k))=0 a.s. ∀k, where ε_(k).sup.(±) is the effective SA measurement noise, i.e., ε_(k).sup.(±) ≡L_(k).sup.(±) -L_(k).sup.(±), with L_(k).sup.(±) ≡L_(k) (θ_(k-1) ±c_(k) Δ_(k)).

C.1 a_(k), c_(k) >0 ∀k; a_(k) →0, c_(k) →0 as k→∞; Σ_(k=1).sup.∞ a_(k) =∞, Σ_(k=1).sup.∞ (a_(k) /c_(k))² <∞.

C.2 For some ρ>0 and ∀k, E(ε_(k).sup.(±))² ≦ρ, E(L_(k) (θ_(k-1) ±c_(k) Δ_(k))²)≦ρ, E(Δ_(kl) ⁻²)≦ρ, |Δ_(kl) |≦ρ, and Δ_(kl) is symmetrically distributed about 0 (l=1,2, . . . , p).

C.3 For some ρ>0, there exists a K<∞ such that ∀k≧K the function L_(k) (·) is continuously thrice differentiable with uniformly (in k) bounded third derivative for all θ such that ∥θ_(k-1) -θ∥≦ρ and almost all θ_(k-1).

C.4 For any ρ>0 and for each k≧1 suppose that if ∥θ-θ*∥≧ρ, there exists a δ_(k) (ρ)>0 such that (θ-θ*)^(T) g_(k) (θ)≧δ_(k) (ρ) where δ_(k) (ρ) satisfies ##EQU5## and c_(k) ² δ_(k) (ρ)⁻¹ →0. Proposition. Let conditions C.0-C.4 hold and suppose there exists a θ* such that θ*_(k) →θ* as k→∞. Then

    θ.sub.k -θ*→0 a.s.                      (8)

Proof. First, from C.0, C.2, and C.3, it can be shown that for all k sufficiently large

    E g.sub.k (θ.sub.k-1)|θ.sub.k-1 !=g.sub.k (θ.sub.k-1)+b.sub.k,                                (9)

where c_(k) ⁻² ∥b_(k) ∥ is uniformly bounded a.s. Now, to show equation (8), we demonstrate that the multivariate analogues of conditions (i)-(iv) of Evans, S. N. and Weber, N. C. 1986!, "On the Almost Sure Convergence of a General Stochastic Approximation Procedure," Bull. Austral. Math. Soc., vol. 34, pp. 335-342, apply. with respect to (i) we have by equation (9), C.4, and Thm. 4.2.2 of Chung, K. L. 1974!, A Course in Probability Theory, Academic, New York, that for any ρ>0,

    P(∥θ.sub.k-1 ∥≧ε, a.sub.k θ.sub.k-1.sup.T (g.sub.k (θ.sub.k-1)+b.sub.k)<0 i.o.)≦P(θ.sub.k-1.sup.T b.sub.k <-δ.sub.k (ρ) i.o.)→0

where θ_(k-1) ≡θ_(k-1) -θ* and i.o. denotes infinitely often, which shows (i). Then from the fact that a_(k) →0 and facts that C.3 and equation (9) indicate that ##EQU6## we know that (ii) holds. The fact that ##EQU7## follows easily by C.1 and C.2, which shows (iii). Finally, with respect to (iv), we must show ##EQU8## For any sample point in the underlying sample space such that liminf_(k)→∞ ∥θ_(k-1) ∥>0 take 0<ρ<liminf_(k)→∞ ∥θ_(k-1) ∥. Then ∃ a k₁ (ρ)<∞ such that ∥θ_(k-1) ∥≧ρ∀k≧k₁, which by C.4 indicates that ∃ a δ'_(k) (ρ)>0 such that ∥g_(k) (θ_(k-1))∥≧δ'_(k) (ρ)∀k≧k₁. Further, by the uniform bound on c_(k) ⁻² ∥b_(k) ∥ we have from C.4 (except, of course, on a set of measure 0) that ∃ a k₂ (ρ)<∞ such that ∥b_(k) ∥≦δ'_(k) (ρ)/2∀k≧k₂. Hence for this sample point, we have ##EQU9## by C.1 and C.4 where k*=max {k₁,k₂ }. Since ∃ such a k* and δ'_(k) for almost all sample points such that liminf_(k)→∞ ∥θ_(k-1) ∥>0, this shows (3.4) (and hence (iv)), which completes the proof. Q.E.D.

Conditions C.0-C.4 will now be commented on. Condition C.0 is critical in ensuring that g_(k) (·) is an unbiased estimator of g_(k) (·) to within an O(c_(k) ²) bias, where c_(k) →0 by C.1. Condition C.1 presents some standard conditions on the SA gains (as discussed below, however, in a system with nonstationary dynamics--where θ*_(k) does not converge--it is generally best to not satisfy this condition). C.2 ensures that the variability of g_(k) (·) is not so large as to potentially cause divergence of the algorithm; the boundedness condition on E(Δ_(kl) ⁻²) is particularly important for the validity of g_(k) (·) as an estimator of g_(k) (·) and for the convergence of θ_(k), and prevents, e.g., taking Δ_(kl) as uniformly or normally distributed (taking Δ_(kl) as symmetrically Bernoulli distributed satisfies this condition and has proven effective in numerical studies). C.3 is used to ensure that L_(k) (·) is sufficiently smooth so that g_(k) (·) is nearly an unbiased estimator of g_(k) (·) based on a third order expansion of L_(k) (·) about θ_(k-1). Of course, C.3 translates into differentiability conditions on φ_(k) (·) and h_(k) (·) in the system model (equations (1) and (2)) and, with the self-tuning method of FIG. 1b, into differentiability conditions on the control law π_(k) (·). C.4 ensures that if we are at a value θ_(k-1) not at θ*, the gradient g_(k) (·) is sufficiently steep (as well as pointing towards θ*) so that there will be a tendency to push the next value θ_(k) towards θ*.² Note that the nonuniformity (in k) that is allowed for δ_(k) (ρ) permits L_(k) (·) to "flatten out" in some fixed region around θ* as k gets large provided that this flattening does not occur too fast.

The assumption in the Proposition that there exists a θ* such that θ*_(k) →θ* is, in fact, quite modest given that the even stronger assumption that θ*_(k) =θ*∀k holds in many applications, including settings where L_(k) (θ) is perpetually varying at any θ (as results from, among other things, a time-varying target t_(k)). In particular, when φ_(k) (·)=φ(·) and h_(k) (·)=h(·) ∀k, the control law can generally be expressed as u_(k) =u(·) for some u(·) not indexed by k. Then, there will generally exist a unique θ* that yields the best possible approximation to u(·) (this is irrespective of whether L_(k) (·) is time varying). This is true in either the case where the FA represents u(·) directly (FIG. 1a) or in the self-tuning case of partial prior information where the FA is used to approximate some f_(k) (·)=f(·) in a predetermined control law (FIG. 1b). The more general condition of the Proposition, θ*_(k) →θ*, allows for a system with transient effects (i.e., one where φ_(k) (·)→φ(·) and h_(k) (·)→h(·) in a pointwise manner).

Let us now illustrate how one of the main regularity conditions, C.0, can be satisfied in a general control problem (the other regularity conditions, C.1-C.4, are fairly natural in terms of most search algorithms or are directly dependent on user-specified quantities). Suppose that the system (equations (1) and (2)) evolves according to

    x.sub.k+1 =φ.sub.k (x.sub.k,u.sub.k)+w.sub.k,

    y.sub.k =x.sub.k +v.sub.k

where {w_(k) } and {v_(k) } are i.i.d. sequences with finite second moments (v_(k) and w_(k+1) may be dependent) and φ_(k) (·) is some (unknown) function. Further assume that we are using the practical control strategy described above, where we generate the sequence of measurements {. . . y_(k+1).sup.(+), y_(k+1).sup.(-), y_(k+1), . . . } without resetting the system from one state to the previous state. It can be readily shown that C.0 would also hold in other implementation strategies as well (such as where the state can be reset). Then according to the definition of ε_(k).sup.(±) in C.0, and using the standard quadratic loss function in equation (3), we can write (assuming without loss of generality that B_(k), E(w_(k)), E(v_(k)), and t_(k+1) are identically 0∀k and that x_(k), y_(k) ε¹ with A_(k) =1 ∀k),

    ε.sub.k.sup.(+) -ε.sub.k.sup.(-) =(φ.sub.k.sup.(+) +w.sub.k.sup.(+) +v.sub.k+1.sup.(+)).sup.2 -(φ.sub.k.sup.(-) +w.sub.k.sup.(-) -E (φ.sub.k +w.sub.k +v.sub.k+1).sup.2 |θ.sub.k =θ.sub.k-1 +c.sub.k Δ.sub.k !+E (φ.sub.k +w.sub.k +v.sub.k+1).sup.2 |θ.sub.k =θ.sub.k-1

where

    φ.sub.k.sup.(+) =φ.sub.k (x.sub.k,u.sub.k (θ.sub.k-1 +c.sub.k Δ.sub.k,t.sub.k+1, I.sub.k.sup.(+)))

    φ.sub.k.sup.(-) =φ.sub.k (x.sub.k+1.sup.(+),u.sub.k (θ.sub.k-1 -c.sub.k Δ.sub.k,t.sub.k+1,I.sub.k.sup.(-)))

and w_(k).sup.(±), v_(k+1).sup.(+) are the corresponding noises. Then by

    E(φ.sub.k.sup.(+)2 |θ.sub.1, . . . , θ.sub.k-1 ; Δ.sub.k)=E E(φ.sub.k.sup.2 |x.sub.k, I.sub.k.sup.(+), θ.sub.k =θ.sub.k-1 +c.sub.k Δ.sub.k)|θ.sub.k-1, Δ.sub.k !=E(φ.sub.k.sup.2 |θ.sub.k =θ.sub.k-1 +c.sub.k Δ.sub.k),

with obvious parallel arguments for φ_(k).sup.(-), we have

    E(ε.sub.k.sup.(+) -ε.sub.k.sup.(-) |θ.sub.1, . . . , θ.sub.k-1 ; Δ.sub.k)=E(ε.sub.k.sup.(+) -ε.sub.k.sup.(-) |θ.sub.k-1 ; Δ.sub.k)=0.

This shows C.0.

Let us close with a brief discussion of the constant gain setting where we take a_(k) =a>0 and/or c_(k) =c>0∀k. It is known that SA algorithms with such gains are better able to track a time-varying root (θ*_(k)) than the usual decaying gain algorithms, which is relevant when θ*_(k) is non-convergent. This is likely to occur, e.g., when the process dynamics φ_(k) (·) and/or measurement apparatus h_(k) (·) are perpetually time-varying due to cyclic behavior or when they change due to a failure or wear and tear of components in the system. In fact, because of their ease of use, such constant gains are sometimes applied in SA (or SA-type) algorithms even when θ*_(k) =θ*∀k, although it is known that they preclude the formal a.s. convergence of decaying gain algorithms.

What follows presents the results of studies on three different nonlinear models, one that is a stochastic generalization of a model in Narendra, K. S. and Parthasarathy, K. 1990!, "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. Neural Nets., vol. 1, pp. 4-26, (N & P hereafter), one that is a stochastic, multivariate generalization of a model in Chen, F. C. 1990!, "Back-Propagation Neural Networks for Nonlinear Self-Tuning Adaptive Control," IEEE Control Syst. Mag., April, pp. 44-48, (hereafter Chen 1990!) and one that is a model for waste-water treatment in Dochain, D. and Bastin, G. 1984!, "Adaptive Identification and Control Algorithms for Nonlinear Bacterial Growth Systems," Automatica, vol. 20, pp. 621-634 (hereafter Dochain and Bastin 1984!). The studies include a comparison of the SPSA and FDSA parameter estimation algorithms and a comparison of the direct approximation (DA) (FIG. 1a) and self-tuning (ST) (FIG. 1b) controllers.

All studies here are based on A_(k) =I and B_(k) =0 in the loss function equation (3) (i.e., a minimum variance regulator). Further, to facilitate having a fair evaluation of the weight estimation algorithms, we produce a nominal measurement (y_(k)) at every iteration, as discussed above. The performance of the various techniques will be evaluated by comparing the root-mean-square (RMS) tracking error of the nominal states as normalized by the dimension of y_(k), i.e., RMS at time k is (y_(k) -t_(k))^(T) (y_(k) -t_(k))/dim (y_(k))!^(1/2). For the FA, we will use a feedforward neural network that has an input layer, two hidden layers, and an output layer, as in N & P and Chen 1990! (we also briefly consider a polynomial FA at the end of the section). The hidden layer nodes are hyperbolic tangent functions (i.e., (e^(x) -e^(-x))/(e^(x) +e^(-x)) for input x) while the output nodes are linear functions (simply x). Each node takes as an input (x) the weighted sum of outputs of all nodes in the previous layer plus a bias weight not connected to the rest of the network (hence an N₄,20,10,2 network, in the notation of N & P, has 100+210+22=332 weights to be estimated). For the weight estimation, we will consider different forms of the SPSA algorithm, denoted SPSA-q, where q denotes the number of individual gradient approximations of the form in equation (7) that are averaged to form g_(k) (·) (hence an SPSA-q algorithm uses 2q measurements to form g_(k) (·)). For the SPSA algorithms we take the perturbations Δ_(ki) to be Bernoulli ±1 distributed, which satisfies the relevant regularity conditions discussed above.

In the first study (additive control system), the model used for producing measurements is a generalization of the two-dimensional model with additive control given in N & P to include additive (independent) noise, i.e.,

    x.sub.k+1 =f(x.sub.k)+u.sub.k +w.sub.k, w.sub.k ˜N(0, σ.sup.2 I), y.sub.k =x.sub.k                                      (11)

where, as in eqn. (18) of N & P, ##EQU10## with x_(ki) the i^(th) (i=1,2) component of x_(k) (the estimation algorithm here does not assume knowledge of f(x_(k)) or the form in equation (11)). Further we take the two-dimensional target sequence {t_(k) } to be as given in FIG. 31 of N&P. FIG. 2 presents the main results for our study of the model in equation (11) based on the practical control procedure discussed above (including the generation of the optional nominal state for purposes of plotting the RMS error). We use the DA method of control (FIG. 1a) with I_(k) ={y_(k) }. The RMS curves in the figures are based on the sample mean of four independent runs with different initial weights θ₀, where the elements of θ₀ were generated randomly from a uniform (-0.1, 0.1) distribution (the same four initial weight vectors were used for the three different curves). To further smooth the resulting error curves and to show typical performance (not just case-dependent variation), we applied the MATLAB low-pass interpolating function INTERP to the error values based on the average of four runs. The curves shown in FIG. 2 are based on this combination of across-realization averaging and across-iteration interpolation. Each of the curves was generated by using SA gains of the form a_(k) =A/k⁰.602 and c_(k) =C/k⁰.101 with A, C>0 (the exponents were chosen to satisfy the SA conditions discussed above). For each SA algorithm we attempted to tune A and C to maximize the rate of decay in RMS error (as would typically be done in practice)³ ; the values satisfied 0.02≦A≦0.08 and 0.10≦C≦0.25. The value x₀ was set to (1.5, 1.5)^(T) and σ=0.50, so the initial RMS error is 1.5 and the minimum achievable long-run RMS error is 0.50.

FIG. 2 shows that both the SPSA and FDSA algorithms yield controllers with decreasing RMS tracking error over time. The long-run performance of SPSA-4 and FDSA is virtually identical, with SPSA-4 and FDSA achieving terminal RMS errors of 0.50 and 0.51 respectively (vs. the theoretical limit of 0.50); for the SPSA-1 algorithm the terminal error was 0.57. The critical observation to make here is that the SPSA algorithms achieved their performance with a large savings in data: each iteration of SPSA-1 and SPSA-4 required only two measurements and eight measurements, respectively, while each iteration of FDSA needed 664 measurements. Hence FIG. 2 illustrates that SPSA-4 yielded approximately the same level of tracking error as the standard FDSA algorithm with an 83-fold savings in state measurements. The data savings seen in FIG. 2 is typical of that for a number of other studies involving SPSA and FDSA that we have conducted on the model of equation (11) as well as on other nonlinear models; in fact, even greater data savings are typical with more complex networks (as might be needed in higher-dimensional systems or in systems where u_(k) is not simply additive). Note also that the curves in FIG. 2 have the typical shape of many optimization algorithms in that there is a sharp initial decline followed by a slow decline. Hence over 65 percent of the possible reduction in RMS error, which may be all that is required in some applications, is achieved in SPSA-4 with fewer than 100 iterations.

The model considered for our second study has the widely studied affine-nonlinear form

    x.sub.k+1 =f.sup.(1) (x.sub.k)+f.sup.(2) (x.sub.k)u.sub.k +w.sub.k

    y.sub.k =x.sub.k                                           (12)

(see, e.g., Chen 1990! for discrete time or Nijmeijer, H. and van der Schaft, A. J. 1990!, Nonlinear Dynamical Control Systems, Springer-Verlag, New York.(hereafter Nijmeijer and van der Schaft 1990!) for continuous time), where f.sup.(1) and f.sup.(2) are a vector and matrix respectively representing unknown system dynamics. The primary goal in this study is to compare the DA and ST methods of control, although we also present a further comparison of SPSA and FDSA.

We will take x_(k) ε² and for purposes of generating the data, take (as before) w_(k) ˜N(0, σ² I) (and independent), x₀ =(1.5, 1.5)^(T), and ##EQU11## Similar to Chen 1990!, the two components of the target sequence will be identical and will be periodic with one period being a 50 iteration sine wave followed by a 50 iteration square wave with both waves having amplitude 1.2. As in the first study, we used decaying SA gains, which were tuned to approximately optimize the performance of the given algorithm. For the ST controller we take

    u.sub.k =π(f(x.sub.k),t.sub.k+1)=f.sup.(2) (x.sub.k).sup.-1 (t.sub.k+1 -f.sup.(1) (x.sub.k)), f=(f.sup.(1), f.sup.(2)),

and, analogous to Chen 1990!, we model f(·) by an N₂,20,10,4 neural network for the f.sup.(1) part and the two unknown diagonal elements of f.sup.(2) (this implementation assumes that the off-diagonal elements are known to be 0 and the f.sup.(2) values are scaled away from zero). For the DA controller, of course, we assume no knowledge of the structure in equations (12) and (13) and use an N₄,20,10,2 network with I_(k) ={y_(k) }.

FIG. 3 presents the results for our first study of the affine-nonlinear model; this study uses the SPSA-1 algorithm and the practical control algorithm discussed above (the same averaging and interpolation used in FIG. 2 was used here). As with the studies above the initial weights θ₀ were determined randomly according to a uniform (-0.1, 0.1) distribution, except for the two bias weights corresponding to f.sup.(2) in the ST method. These were initialized differently to reflect a varying level of prior information about the system: in the curve labeled "good initial weights" these two biases were set to 5.0 and in the curve labeled "poor initial weights" the two biases were set to 2.0. The ST method is quite sensitive to the prior information, but can outperform DA if the information is good (the terminal RMS error for the good ST run is 0.26 versus a theoretical limit of 0.25 and the DA value of 0.30). On the other hand, DA outperforms ST when the initial conditions for ST are poor. This result seems to follow from the inherent instability of x_(k) due to the presence of a matrix inverse (f.sup.(2))⁻¹ in the ST controller. To avoid the instability associated with the near-singularity of f.sup.(2) that sometimes occurred when the initial conditions were poor, the SA coefficient A needed to be set to a level so small that the convergence of θ_(k) was inhibited. This helps explain the difference between the two ST curves in FIG. 3. In contrast, the DA method proved to be more numerically stable since it did not involve matrix inversion.

To verify that this phenomenon was not unique to the set-up above, studies for a number of other affine-nonlinear models and/or other neural network configurations were conducted, and it was consistently found that ST outperformed DA only when it had good initial weights on the f.sup.(2) portion of the NN output. This study has indicated that even when enough information for a given system is available to implement the ST method, it may still be valuable to examine the performance of the DA method. This appears to be especially true when the control functional π(·) is such that x_(k) may be very sensitive to deviations from the true f(·) in the control.

Our other numerical study for equations (12) and (13) illustrates the relative performance of SPSA and FDSA in a deterministic (σ=0) system (to complement the stochastic comparison in the study for equation (11)). As with equation 11, the DA control method was used. The mean and terminal RMS errors for SPSA-1 were, respectively, 0.11 and 0.08 versus 0.13 and 0.10 for FDSA. Thus SPSA outperforms FDSA with less than 1/300 the number of required system measurements.

The final study presents some preliminary results related to a model for waste-water treatment in Dochain and Bastin 1984! (D&B). This is a model of affine-nonlinear form where the aim is to control the output substrate concentration based on adjusting the dilution rate at the input to a fermentation process. The results below are based on producing system measurements according to eqns. (8) and (9) of D&B, where the parameters in these equations are taken as in the simulations of D&B. This model has time-varying dynamics (from FIG. 6 of D&B) and a time-varying target (from FIG. 4 of D&B).

Two types of DA controls were implemented, one based on a neural network and one based on a third-order polynomial. The SPSA-1 algorithm with constant gains (a_(k) =a, c_(k) =c) was considered to account for the time-varying dynamics, as discussed above. The minimum achievable RMS error for this study was 0. For the neural network controller, the RMS error reduces in 20K iterations from 0.62 to 0.006 while for the polynomial controller the reduction is from 0.62 to 0.009. As with the models above, there is a steep initial decay in the RMS errors, with approximately 70 percent of the possible decay occurring within the first 100 iterations. The above results offer promise for future study of such waste treatment models.

A practical implementation of the method of the invention is for optimal traffic control that is able to automatically adapt to both daily non-recurring events (accidents, temporary lane closures, etc.) and to long-term evolution in the transportation system (seasonal effects, new infrastructure, etc.).

Through use of an advanced transportation management system (ATMS), that includes sensors and computer-based control of traffic lights, a municipality seeks to more effectively use the infrastructure of the existing transportation network, thereby avoiding the need to expand infrastructure to accommodate growth in traffic. A major component of ATMS is to develop means to optimally control traffic flow in real-time.

There are, of course, numerous challenges to achieving optimal traffic flow in a complex transportation system. These include being able to accommodate accidents and other unpredictable events in real-time and being able to accommodate long-term changes in the transportation system without having to do a manual recalibration of the control algorithm in the ATMS software. It appears that the vast majority of focus in ATMS thus far has been on the hardware (sensors, detectors, and other surveillance devices) and data processing aspects. In fact, however, the advances in these areas will be largely wasted unless they are coupled with appropriate analytical techniques for adaptive control. This analytical aspect is the problem to be solved by the invention.

The approach to traffic control will be based on applying the inventive technique for adaptive control discussed above using neural networks (NNs) as the FA. NNs are mathematical function approximators that are patterned loosely after the design of the human brain. They have a unique ability to "learn" about and to improve the complex processes they control, in much the same way that scientists believe the human brain learns. Control of a large transportation system, with different modes (auto, transit, bicycle, etc.) and with hundreds of interrelated traffic lights and arterial flows, seems to be an ideal candidate process for a NN application.

Essential to the use of NNs in adaptive control is the estimation of the connection weights in the NN. Estimating the weights corresponds to the above-mentioned learning process. In the adaptive control approach here, the weights of the NN are estimated while the system is being controlled. The weight estimation is accomplished using the stochastic approximation algorithm that is based on a simultaneous perturbation gradient approach of the invention.

The invention can be successfully applied to automatically control traffic signals so as to achieve optimal flow conditions on a network-wide basis. Inputs to the NN-based controller would include real-time measurements of traffic flow conditions, previous signal control settings, and desired flow levels for the different modes of transportation. Th NN controller would be trained with weights established using the simultaneous perturbation stochastic approximation (SPSA) methodology of the invention. The resulting control algorithm would adjust to changing flow patterns and provide a robust capability to accommodate accidents and other non-recurring events.

A major difference between the invention and other transportation control algorithms that are used or have been proposed (including those with NNs), is that the SPSA-based approach does not require the development of models describing system flow dynamics. The NN is used only for the controller (i.e., no NN or other mathematical model is needed for the system dynamics). This allows for the control algorithm to more readily adjust to long-term changes in the transportation system. As noted by Rowe, E. (1991), "The Los Angeles Automated Traffic Surveillance and Control System," IEEE Trans. on Vehicular Tech., vol. 40, pp. 16-20, it is a very expensive and labor intensive job to periodically recalibrate the system model to accommodate the system evolution (and hence is not done as frequently as it should be in many jurisdictions). Since the invention does not require a system model, this problem is avoided. Furthermore, the invention avoids the seemingly hopeless problems of (1) attempting to mathematically model human behavior in a transportation system (e.g., modelling how people would respond to radio announcements of an accident situation) and (2) of modelling the complex interactions among flows along different arteries.

The application of NNs to traffic control has been proposed and examined by others. It is the use of the SPSA methodology of the invention to train and continually adjust the NN weights that is unique to this approach and is critical to the successful development of a NN-based control mechanism that readily adapts to changes in the transportation network without building a model (NN or otherwise) of the traffic flow dynamics.

The method of the invention solves the problem of controlling nonlinear stochastic systems with unknown process and measurement equations. The invention differs from other methods in that it does not require an explicit model for the open-loop behavior of the system and, consequently, does not require a prior system identification step. Rather, the invention fills the void left by the current unavailability of methods for directly adjusting the control parameters based on the output error (between the plant and reference target! outputs). The invention encompasses two different methods--direct approximation (DA) control and self-tuning (ST) control--where DA applies when there is very little information available about the system dynamics and ST applies when some (still incomplete) information is available.

Since full knowledge of the structure of the equations describing the system is not assumed, it is not possible to calculate the gradient of the loss function for use in gradient-descent-type search algorithms. Therefore, a stochastic approximation-based method for the estimation of the controller is described, which is based on a "simultaneous perturbation" approximation to the unknown gradient. This method relies on observing the system state at two (or a low multiple of two) levels of the control to construct the gradient approximation. Both theoretical and empirical evidence indicate that this SPSA method for weight estimation is much more efficient (in terms of the number of system measurements needed) than more standard Kiefer-Wolfowitz-type SA methods based on finite-difference approximations to the gradient. 

We claim:
 1. A method for approximating a controller for a nonlinear, discrete-time, closed-loop, stochastic system wherein the nonlinear functions governing the system dynamics and measurement process are unknown, the method comprising the steps of:obtaining information about the system; inputting the information about the system into a data processing means; approximating the controller using the data processing means and the information about the system comprising the steps of:selecting a single function approximator to approximate directly the controller; estimating the unknown parameters of the single function approximator in the controller using a stochastic approximation algorithm; and using the single function approximator to approximate the controller, wherein an output of the single function approximator is the controller; and using the controller to control the system.
 2. The method as recited in claim 1, the selecting a single function approximator step comprising the step of selecting a single continuous function approximator to approximate the controller.
 3. The method as recited in claim 1, the estimating the unknown parameters step comprising the step of estimating the unknown parameters of the single function approximator in the controller using a simultaneous perturbation stochastic approximation algorithm.
 4. The method as recited in claim 3, the selecting a single function approximator step comprising the step of selecting a neural network to approximate the controller.
 5. The method as recited in claim 4, the selecting a neural network step comprising the step of selecting a multilayered, feed-forward neural network to approximate the controller.
 6. The method as recited in claim 4, the selecting a neural network step comprising the step of selecting a recurrent neural network to approximate the controller.
 7. The method as recited in claim 3, the selecting a single function approximator step comprising the step of selecting a polynomial to approximate the controller.
 8. The method as recited in claim 3, the selecting a single function approximator step comprising the step of selecting a spline to approximate the controller.
 9. The method as recited in claim 3, the selecting a single function approximator step comprising the step of selecting a trigonometric series to approximate the controller.
 10. The method as recited in claim 3, the selecting a single function approximator step comprising the step of selecting a radial basis function to approximate the controller. 